34 research outputs found

    Hyracks: A flexible and extensible foundation for data-intensive computing

    Hyracks is a new partitioned-parallel software platform designed to run data-intensive computations on large shared-nothing clusters of computers. Hyracks allows users to express a computation as a DAG of data operators and connectors. Operators operate on partitions of input data and produce partitions of output data, while connectors repartition operators' outputs to make the newly produced partitions available at the consuming operators. We describe the Hyracks end user model, for authors of dataflow jobs, and the extension model for users who wish to augment Hyracks' built-in library with new operator and/or connector types. We also describe our initial Hyracks implementation. Since Hyracks is in roughly the same space as the open source Hadoop platform, we compare Hyracks with Hadoop experimentally for several different kinds of use cases. The initial results demonstrate that Hyracks has significant promise as a next-generation platform for data-intensive applications.

    AsterixDB: A Scalable, Open Source BDMS

    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.

    Inverse Functions in the AquaLogic Data Services Platform

    When integrating data from heterogeneous sources, it is often necessary to transform both the schemas and the data from the underlying sources in order to present the integrated data in the form desired by its consuming applications. Unfortunately, these transformations, particularly if implemented by custom code, can block query optimization and updates, leading to potentially severe performance and functionality limitations. To circumvent these problems, the BEA AquaLogic Data Services Platform provides support for user-defined inverse functions. This paper describes the motivation, design, user experience, and implementation associated with inverse functions in ALDSP. This functionality debuted in version 2.1 of ALDSP in March 2006.

    Uncovering the full potential of data services

    Making use of available services when building Web applications is a major challenge for today's developers. I address this challenge by using a declarative interface for data-centric Web services (aka data services), which are published as queries over a source schema. Programmers simply write queries over the source schema and rely on the system to automatically translate them to calls to existing data services. Thus, programmers can focus on extracting the data they need, without having to understand the definition or the implementation of each individual service. This dissertation discusses the main underlying technical problem, that of deciding whether a query can be translated into service calls. I consider two settings: when the system cannot do any post-processing and hence can issue only one service call (I call that expressibility) and when it is able to issue several calls and combine the results (I call it support). Expressibility and support are studied both for services that are listed individually and for compactly represented services (using grammar-like or Datalog formalisms). I also present contributions to extending the underlying service infrastructure with new features, several of which were added to the Distributed XQuery (DXQ) framework. DXQ is an XML query and scripting language with support for side effects, distribution, and parallelism, which I also used as an implementation platform for workflow languages.

    XQuery at Your Web Service

    XML messaging is at the heart of Web services, providing the flexibility required for their deployment, composition, and maintenance. Yet, current approaches to Web services development hide the messaging layer behind Java or C# APIs, preventing the application from getting direct access to the underlying XML information. To address this problem, we advocate the use of a native XML language, namely XQuery, as an integral part of the Web services development infrastructure. The main contribution of the paper is a binding between WSDL, the Web Services Description Language, and XQuery. The approach enables the use of XQuery for both Web services deployment and composition. We present a simple command-line tool that can be used to automatically deploy a Web service from a given XQuery module, and extend the XQuery language itself with a statement for accessing one or more Web services. The binding provides tight coupling between WSDL and XQuery, yielding additional benefits, notably the ability to use WSDL as an interface language for XQuery and the ability to perform static typing on XQuery programs that include Web service calls. Last but not least, the proposal requires only minimal changes to the existing infrastructure. We report on our experience implementing this approach in the Galax XQuery processor.

    ABSTRACT

    We present DXQ, an extension of XQuery to support the effective and efficient development of distributed XML applications. A DXQ program can invoke remote DXQ programs both synchronously and asynchronously and can dynamically ship DXQ code to execute at remote servers. We illustrate the power of the language with two distributed applications: the resolution algorithm of the Domain Name System (DNS) and the Narada overlay-network protocol. Our implementation permits concurrent evaluation of DXQ expressions at each server and can produce results extensionally (as XML values) or intensionally (as DXQ expressions).